Introduction

In this document, we will outline the Bayesian analogs of the statistical analyses described in lecture 1 (Github code).

Note on MathJax

If you use RStudio’s Visual editor mode. Check if you can see anything written in the following parenthesis: (\(k\)). If not, your RStudio has a bug. This problem will be fixed in future versions. Meanwhile, if some text seems to be missing in the file below, simply position your cursor around the missing area to reveal the underlying code.

Load packages

library(dplyr)
library(tibble)
library(purrr)
library(patchwork)
library(tidyr)
library(forcats)
library(gtools)
library(broom)
library(broom.mixed)
library(modelr)
library(brms)
library(tidybayes)
library(ggdist)
library(bayesplot)
library(ggplot2)
library(knitr)
BRM_BACKEND <- ifelse(require("cmdstanr"), 'cmdstanr', 'rstan')

Dataset

The following dataset is from experiment 2 of “How Relevant are Incidental Power Poses for HCI?” (Jansen & Hornbæk, 2018). Study participants were asked to either make an expansive posture or a constrictive posture before performing a task. The experiment investigated whether posture could potentially have an effect on risk taking behavior.

First, we load the data.

pose_df = readr::read_csv("data/poses_data.csv", show_col_types = FALSE) %>%
  mutate(condition) %>%
  group_by(participant)

The data has been aggregated for each participant: - condition = expansive indicates expansive posture, and condition = constrictive indicates constrictive posture - The dependent variable is change which indicates the percentage change in risk-taking behavior. Thus, it is a continuous variable.

For the purposes of this demo, we are only concerned with these two variables. We can ignore the other variables for now.

head(pose_df %>% select(participant, condition, change))
participant condition change
1 expansive 3.362832
2 constrictive 29.147982
3 expansive 25.409836
4 constrictive 54.069767
5 expansive -36.644592
6 constrictive 29.756098

Intuition of Bayesian Statistics

The Bayesian t-test (BEST) assumes that the data in the two conditions arises from two separate t-distributions. In the following section, we will describe the process for one of the conditions in the experiment.

We will use the \(Normal(\mu = 20, \sigma = 20)\) as the prior distribution.

First, we define some functions for manual calculation of the posterior normal distribution:

sigma_post = function(sigma_prior, sigma, n = 1) {
  sqrt(1 / (1 / (sigma_prior^2) + n / (sigma^2)))
}
mu_post = function(mu_prior, sigma_prior, mu, sigma, n = 1) {
  tau = sigma_post(sigma_prior, sigma, n)
  (tau^2 / sigma_prior^2)*mu_prior + (n * tau^2 / sigma^2)*mu
}
d.p2 = tibble(
  group = c("prior", "expansive", "constrictive", "posterior"), 
  mu = c(20, 32.82, 31.61, mu_post(20, 20, 32.82, 7.52)), 
  sd = c(20, 7.52, 7.06, sigma_post(20, 7.52))
) %>%
  mutate(
    cutoff_group = list(c(1:7)),
    cutoff = list(c(0, 15, 25, 30, 32.82, 40, 100))
  ) %>%
  unnest(c(cutoff_group, cutoff)) %>%
  mutate(
    cutoff = if_else(group == 'prior', 100, cutoff),
    x = map(cutoff, ~ seq(from = -40, ., by = 0.1)),
    y = pmap(list(x, mu, sd), ~ dnorm(..1, ..2, ..3))
  )

In the plot below, we show the raw data distribution for the two conditions:

p1 = pose_df %>%
  ggplot() +
  geom_point(aes(x = change, y = condition, colour = condition), 
             position = position_jitter(height = 0.1), alpha = 0.7) +
  scale_color_theme() + 
  labs(y = "Condition") +
  theme(
    legend.position = "none", 
    axis.line.y = element_blank(), 
    axis.ticks.y = element_blank(),
    axis.title.y = element_blank()
  ) +
  scale_x_continuous(limits = c(-150, 250), breaks = seq(-150, 250, by = 50))

p2.blank = tibble(y = c("expansive", "constrictive"), x = 0) %>%
  ggplot(aes(x, y)) +
  scale_color_theme() + 
  theme_density

cowplot::plot_grid(p2.blank, p1, nrow = 2)

First, we define a function to help us plot different cutoff groups.

plot_preliminary <- function(data, group_num, group_name){
  data %>%
  filter(cutoff_group == group_num & group %in% group_name) %>%
  unnest(c(x, y)) %>%
  ggplot(aes(x, y)) +
  #geom_line(aes(color = group), size = 1) +
  # density
  geom_area(aes(fill = group, color = group), position = "identity", linewidth = 1, alpha = 0.3) +
  scale_x_continuous(limits = c(-40, 100)) +
  scale_y_continuous(limits = c(0, 0.1)) +
  scale_color_theme() +
  theme_density
}
cowplot::plot_grid(
  plot_preliminary(d.p2, 0, NULL), 
  p1, nrow = 2)

Next, we plot the prior density:

cowplot::plot_grid(
  plot_preliminary(d.p2, 1, 'prior'), 
  p1, nrow = 2)

Then we describe step by step, how the likelihood is computed:

cowplot::plot_grid(
  plot_preliminary(d.p2, 1, 'expansive'), 
  p1, nrow = 2)

cowplot::plot_grid(
  plot_preliminary(d.p2, 2, 'expansive'), 
  p1, nrow = 2)

cowplot::plot_grid(
  plot_preliminary(d.p2, 3, 'expansive'), 
  p1, nrow = 2)

cowplot::plot_grid(
  plot_preliminary(d.p2, 5, 'expansive'), 
  p1, nrow = 2)

cowplot::plot_grid(
  plot_preliminary(d.p2, 6, 'expansive'), 
  p1, nrow = 2)

cowplot::plot_grid(
  plot_preliminary(d.p2, 7, 'expansive'), 
  p1, nrow = 2)

We want to compute the posterior, which is the product of the prior and likelihood:

cowplot::plot_grid(
  plot_preliminary(d.p2, 7, c('prior', 'expansive', 'posterior')), 
  p1, nrow = 2)

Of course, if you have gganimate and magick working on your computer, you can try the following code to get a gif! Here, we are not going to show the code.

Fumeng: the content below needs better documentation….

Model 1: Equal standard deviations

Step 1: model specification

Fumeng: I hope to keep this example simple. Is there a simple way to justify the choice of \(\nu\)?

Here we use a mathematical expression for the model above. Fumeng: need explanations here…

\[ \begin{align} y_{i} &\sim \mathrm{Student\_t}(\mu, \sigma_{0}, \nu_{0}) \\ \mu &= \beta_{0} + \beta_{1} * x_i \\ \sigma_{0} &\sim \mathrm{HalfNormal}(0, 10) \\ \beta_{0} &\sim \mathrm{?} \\ \beta_{1} &\sim \mathrm{Normal}(0,2) \\ \nu_{0} &\sim \mathrm{?} \\ i & \in \{\mathrm{expansive}, \mathrm{constrictive}\} \end{align} \]

We translate this thought into brms formula using the function bf.

model.1.formula <- bf(
                      # we think change is affected by different conditions.
                      change ~ condition,
                      # to tell brms which response distribution to use
                      family = student()
                    )

Now we can use get_prior() to inspect the available priors and formula. As what we learned, we have priors for

tibble(get_prior(model.1.formula, pose_df))
prior class coef group resp dpar nlpar lb ub source
b default
b conditionexpansive default
student_t(3, 22.6, 38.8) Intercept default
gamma(2, 0.1) nu 1 default
student_t(3, 0, 38.8) sigma 0 default

prior checks

Look at the default priors

cowplot::plot_grid(
tibble(x = qstudent_t(ppoints(n = 500), df = 3, mu = 22.6, sigma = 38.8)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('student_t(3, 22.6, 38.8) for Intercept') +
  coord_cartesian(xlim = c(-300, 300),expand = c(0)),
tibble(x = qgamma(ppoints(n = 500), shape = 2, rate = .1)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('Gamma(2, 0.1) for nu') +
  coord_cartesian(xlim = c(1, 100), expand = 0),
tibble(x = qstudent_t(ppoints(n = 500), df = 3, mu = 0, sigma = 38.8)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('student_t(3, 0, 38.8) for sigma') +
  coord_cartesian(xlim = c(0, 400),expand = c(0)),
ncol = 3)

prior check for default priors

model.1.checks_default <- brm(
  model.1.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 3), class = 'b')
  ),
  sample_prior = 'only',
  #backend = BRM_BACKEND,
  file = 'rds/model.1.checks_default.rds',
  file_refit = 'on_change' 
)

plot

model.1.defaultpriorsamples <- 
  model.1.checks_default %>% 
    epred_draws(tibble(condition = c('expansive', 'constrictive')))
head(model.1.defaultpriorsamples)
condition .row .chain .iteration .draw .epred
expansive 1 NA NA 1 4.919106
expansive 1 NA NA 2 25.242105
expansive 1 NA NA 3 15.514846
expansive 1 NA NA 4 61.385267
expansive 1 NA NA 5 98.523079
expansive 1 NA NA 6 -60.393775
model.1.defaultpriorsamples %>% 
  ggplot(aes(x = .epred, group = condition)) +
  geom_density(alpha = .5, color = NA, adjust = 2, fill = theme_yellow) +
  theme_density_x + 
  ggtitle('Checks default priors of model.1')

Our priors

cowplot::plot_grid(
tibble(x = qnorm(ppoints(n = 1000), mean = 0, sd = 2)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('Normal(0, 2) for b & Intercept') +
  coord_cartesian(xlim = c(-8, 8), expand = 0),
tibble(x = qnorm(ppoints(n = 1000), mean = 0, sd = 10)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('HalfNormal(0, 10) for sigma') +
  coord_cartesian(xlim = c(0, 50), expand = c(0)),
ncol = 2)

model.1.checks <- brm(
  model.1.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 2), class = 'b'),
    prior(normal(0, 10), class = 'sigma', lb = 0)
  ),
  sample_prior = 'only',
  #backend = BRM_BACKEND,
  file = 'rds/model.1.checks.rds',
  file_refit = 'on_change' 
)

plot

model.1.priorsamples <- 
  model.1.checks %>% 
    epred_draws(tibble(condition = c('expansive', 'constrictive')))

Compare two sets of priors

cowplot::plot_grid(
  
model.1.defaultpriorsamples %>% 
  ggplot(aes(x = .epred, group = condition)) +
  geom_density(alpha = .5, color = NA, adjust = 2, fill = theme_yellow) +
  theme_density_x + 
  #coord_cartesian(xlim = c(-800, 800)) + 
  ggtitle('Checks default priors of model.1')
,
model.1.priorsamples %>% 
  ggplot(aes(x = .epred, group = condition)) +
  geom_density(alpha = .5, color = NA, adjust = 2, fill = theme_yellow) +
  theme_density_x + 
  #coord_cartesian(xlim = c(-800, 800)) + 
  ggtitle('Checks our priors of model.1')
,
ncol = 1)

Step 2: model fitting

model.1 <- brm(
  model.1.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 2), class = 'b'),
    prior(normal(0, 10), class = 'sigma', lb = 0)
  ),
  backend = BRM_BACKEND,
  file = 'rds/model.1.rds'
)

Step 3: check posteriors

aspect 1: mcmc traces

color_scheme_set("teal")
mcmc_trace(model.1, facet_args = list(ncol = 4))

aspect 2: model metrics

Rhat, ESS, etc here

summary(model.1)
##  Family: student 
##   Links: mu = identity; sigma = identity; nu = identity 
## Formula: change ~ condition 
##    Data: pose_df (Number of observations: 80) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             22.84      4.49    14.50    31.77 1.00     2686     2092
## conditionexpansive    -0.26      1.97    -4.10     3.62 1.00     2726     2987
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    28.35      4.13    20.76    36.92 1.00     2180     2520
## nu        3.68      2.35     1.57     9.04 1.00     2172     2158
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

aspect 3: visually

Fumeng: do we need this?

model.1.predictions <- 
  model.1 %>% 
    predicted_draws(tibble(condition = c('expansive', 'constrictive')))
head(model.1.predictions)
condition .row .chain .iteration .draw .prediction
expansive 1 NA NA 1 -0.6345021
expansive 1 NA NA 2 42.4771348
expansive 1 NA NA 3 -8.2274815
expansive 1 NA NA 4 -10.1393149
expansive 1 NA NA 5 18.9937838
expansive 1 NA NA 6 -35.5627088
plot_predictions <- function(model, df = NULL, title = ''){
  
  if(is.null(df))
      df = tibble(condition = c('expansive', 'constrictive'), 
                           participant = c(-1, -1))
  model %>% 
    predicted_draws(df,
                    seed = 1234,
                    ndraws = NULL,
                    allow_new_levels = TRUE,
                    sample_new_levels = 'uncertainty') %>% 
    ggplot(aes(x = .prediction, fill = condition)) +
    geom_density(alpha = .5, size = 1, adjust = 2, color = NA) +
    scale_color_theme() +
    theme_density_x + 
    scale_y_continuous(breaks = 0, labels = 'constrictive') + 
    scale_x_continuous(breaks = seq(-150, 250, by = 50)) +
    coord_cartesian(xlim = c(-150, 250)) + 
    ggtitle(paste0('Posterior predictions of ', deparse(substitute(model))))
}
cowplot::plot_grid(plot_predictions(model.1), 
                   p1 + 
                    scale_x_continuous(breaks = seq(-150, 250, by = 100)) +
                    coord_cartesian(xlim = c(-150, 250)), 
                   nrow = 2)

model.1.posteriors <- 
  model.1 %>% 
    epred_draws(tibble(condition = c('expansive', 'constrictive')))
plot_posteriors <- function(model, df = NULL, title = ''){
  
  if(is.null(df))
     df = tibble(condition = c('expansive', 'constrictive'))
  model %>% 
    epred_draws(df, 
                # ignoring random effects if there is any
                #seed = 1234,
                ndraws = NULL,
                re_formula = NA) %>% 
    ggplot(aes(x = .epred, fill = condition)) +
    geom_density(alpha = .5, size = 1, adjust = 2, color = NA) +
    scale_color_theme() +
    theme_density_x + 
    scale_y_continuous(breaks = 0, labels = 'constrictive') + 
    scale_x_continuous(limits = c(-150, 250), breaks = seq(-150, 250, by = 50)) +
    ggtitle(paste0('Posterior means of ', deparse(substitute(model))))
}
cowplot::plot_grid(plot_posteriors(model.1), p1, nrow = 2)

Model 2: the BEST test model

Step 1: model specification

This model is the BEST test model as described by Kruschke in the paper Bayesian estimation supersedes the t-test. In this model, \(\beta\) indicates the mean difference in the outcome variable between the two groups (in this case, the percent change in the BART scores). We fit different priors on \(\beta\) and set different weights on these priors to obtain our posterior estimate.

\[ \begin{align} y_{i} &\sim \mathrm{T}(\nu, \mu, \sigma) \\ \mu &= \beta_{0} + \beta_{1} * x_i \\ \sigma &= \sigma_{a} + \sigma_{b}*x_i \\ \beta_{1} &\sim \mathrm{Normal}(\mu_{0}, \sigma_{0}) \\ \sigma_a, \sigma_b &\sim \mathrm{Cauchy}(0, 2) \\ \nu &\sim \mathrm{exp}(1/30)\\ i & \in \{\mathrm{expansive}, \mathrm{constrictive}\} \end{align} \]

model.2.formula <- bf(# we think change is affected by different conditions.
                      change ~ condition,
                      sigma ~ condition,
                      # to tell brms which response distribution to use
                      family = student())
tibble(get_prior(model.2.formula, pose_df))
prior class coef group resp dpar nlpar lb ub source
b default
b conditionexpansive default
student_t(3, 22.6, 38.8) Intercept default
gamma(2, 0.1) nu 1 default
b sigma default
b conditionexpansive sigma default
student_t(3, 0, 2.5) Intercept sigma default

prior check

Fumeng: not sure qcauchy has the correct paramaterization

cowplot::plot_grid(
  tibble(x = qnorm(ppoints(n = 1000),  mean = 0, sd = 2)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('Normal(0 ,2) for Intercept') +
  coord_cartesian(expand = c(0)),
tibble(x = qexp(ppoints(n = 1000), rate = 0.0333)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('exponential(0.0333) for nu') +
  coord_cartesian(expand = 0),
tibble(x = qcauchy(ppoints(n = 1000), location = 0, scale = 2)) %>% 
  ggplot() +
  geom_density(aes(x = x), fill = theme_yellow, color = NA) +
  ggtitle('cauchy(0, 2) for sigma') +
  coord_cartesian(expand = c(0)),
ncol = 3)

model.2.priorchecks <- brm(
  model.2.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 2), class = 'b'),
    prior(cauchy(0, 2), class = 'b', dpar = 'sigma'),
    prior(exponential(0.0333), class = 'nu')
  ),
  sample_prior = 'only',
  #backend = BRM_BACKEND,
  file = 'rds/model.2.priorchecks.rds',
  file_refit = 'on_change' 
)
model.2.priorchecks %>% 
 epred_draws(tibble(condition = c('expansive', 'constrictive'))) %>% 
  ggplot(aes(x = .epred, group = condition)) +
  geom_density(alpha = .5, color = NA, adjust = 2, fill = theme_yellow) +
  theme_density_x + 
  ggtitle('Checks priors of model.2')

Step 2: model fiting

model.2 <- brm(
  model.2.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 2), class = 'b'),
    prior(cauchy(0, 2), class = 'b', dpar = 'sigma'),
    prior(exponential(0.0333), class = 'nu')
  ),
  backend = BRM_BACKEND,
  file = 'rds/model.2.rds',
  file_refit = 'on_change' 
)

Step 3: posterior checks

summary(model.2)
##  Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: change ~ condition 
##          sigma ~ condition
##    Data: pose_df (Number of observations: 80) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                   24.56      4.99    15.10    34.76 1.00     2787
## sigma_Intercept              3.41      0.20     3.00     3.79 1.00     2307
## conditionexpansive          -0.11      1.99    -4.11     3.76 1.00     3769
## sigma_conditionexpansive     0.15      0.21    -0.27     0.57 1.00     3783
##                          Tail_ESS
## Intercept                    2788
## sigma_Intercept              2293
## conditionexpansive           2642
## sigma_conditionexpansive     2778
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## nu     6.00      7.42     1.72    23.39 1.00     1910     1876
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
wrap_plots(plot_predictions(model.2) + scale_x_continuous(limits = c(-200,250), expand = c(0,0), breaks = seq(-150, 250, by = 50)) 
                   , 
                   p1 + scale_x_continuous(limits = c(-200,250), expand = c(0,0), breaks = seq(-150, 250, by = 50)) , 
                   nrow = 2)

# draws.model.3 = 
wrap_plots(
  plot_posteriors(model.2), 
  p1,
nrow = 2)

model.2.posteriors <- model.2 %>% 
                epred_draws(tibble(condition = c('expansive', 'constrictive')), 
                #seed = 1234,
                ndraws = NULL,
                re_formula = NA) 

Model 3: Poisson / Negative-Binomial Regression model

Step 1: model specification

\[ \begin{align} y_{i} &\sim \mathrm{Neg-Binomial}(\mu, \phi) \\ \mu &= \beta_{i,j} + \beta_{1} * x_i \\ \beta_{1} &\sim \mathrm{Normal}(0, 2) \\ i & \in \{\mathrm{expansive}, \mathrm{constrictive}\}\\ j & \in \{1, ..., \mathrm{N}\} \end{align} \]

pose_raw_df = read.csv("data/posture_data-raw.csv") %>%
  mutate(participant = factor(participant)) %>%
  rename(trial = trial.number)

head(pose_raw_df)
participant condition total.money trial trial.money exploded pumps life
1 expansive 910 0 62 0 62 72
1 expansive 910 1 0 1 27 27
1 expansive 910 2 47 0 47 98
1 expansive 910 3 0 1 86 86
1 expansive 910 4 60 0 60 104
1 expansive 910 5 0 1 26 26
model.3= brm(pumps ~ 1 + condition + trial + (trial | participant),
                      data = pose_raw_df, family = negbinomial,
                      prior = c(prior(normal(0, 10), class = Intercept),
                                prior(normal(0, 1), class = b),
                                prior(gamma(0.01, 0.01), class = shape)
                      ),  # this is the brms default
                      file = 'rds/model.3.rds',
                      file_refit = 'on_change' ,
                      iter = 4000, warmup = 1000, cores = 4, chains = 4)
summary(model.3)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: pumps ~ 1 + condition + trial + (trial | participant) 
##    Data: pose_raw_df (Number of observations: 2400) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~participant (Number of levels: 80) 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)            0.37      0.04     0.31     0.45 1.00     3834
## sd(trial)                0.01      0.00     0.00     0.01 1.00     3766
## cor(Intercept,trial)    -0.62      0.12    -0.82    -0.35 1.00     9335
##                      Tail_ESS
## sd(Intercept)            6032
## sd(trial)                4648
## cor(Intercept,trial)     8296
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept              3.52      0.06     3.41     3.63 1.00     2896     5345
## conditionexpansive    -0.04      0.07    -0.17     0.10 1.00     3288     5424
## trial                  0.01      0.00     0.00     0.01 1.00     6928     8176
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     7.66      0.28     7.14     8.22 1.00    17558     8043
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Step 2: model interpretation

The results of this model are on the log-odds scale. What do the coefficients mean? The simplest way is to simply transform the data into a more interpretable scale and visualise the results:

# draws.model.3 = 
wrap_plots(
  plot_predictions(model.3, df = crossing(condition = c('expansive', 'constrictive'), 
                                          trial = 29,
                                          participant = 0)), 
  p1,
nrow = 2)

# draws.model.3 = 
wrap_plots(
  plot_posteriors(model.3, df = tibble(condition = c('expansive', 'constrictive'), 
                                       trial = 29,
                                       participant = 0)), 
  p1,
nrow = 2)

model.3.posteriors <-  model.3 %>% 
    epred_draws(tibble(condition = c('expansive', 'constrictive'),     
                       trial = 29,
                       participant = 0),
                allow_new_levels  = TRUE) %>% 
  mutate(model = 'model 3') 

From this, we can see that there does not appear to be a difference between the two conditions.

Model 4: the SAYA model

SAYA = SArma + YAng

Fumeng: do we want to get into random intercepts?

Step 1: model specification

\[ \begin{align} y_{i} &\sim \mathrm{T}(\nu, \mu, \sigma) \\ \mu &= \beta_{i,j} + \beta_{1} * x_i \\ \sigma &= \sigma_{a} + \sigma_{b}*x_i \\ \beta_{1} &\sim \mathrm{Normal}(0, 2) \\ \sigma_a, \sigma_b &\sim \mathrm{HalfNormal}(0, 10) \\ \nu &\sim \mathrm{exp}(1/30)\\ i & \in \{\mathrm{expansive}, \mathrm{constrictive}\}\\ j & \in \{1, ..., \mathrm{N}\} \end{align} \]

model.4.formula <- bf(# we think change is affected by different conditions.
                      change ~ condition + (1|participant),
                      sigma ~ condition,
                      # to tell brms which response distribution to use
                      family = student())
tibble(get_prior(model.4.formula, pose_df))
prior class coef group resp dpar nlpar lb ub source
b default
b conditionexpansive default
student_t(3, 22.6, 38.8) Intercept default
gamma(2, 0.1) nu 1 default
student_t(3, 0, 38.8) sd 0 default
sd participant default
sd Intercept participant default
b sigma default
b conditionexpansive sigma default
student_t(3, 0, 2.5) Intercept sigma default

Step 2: model fiting

model.4 <- brm(
  model.4.formula,
  data = pose_df, 
  family = student_t(),
  prior = c(
    prior(normal(0, 2), class = 'b'),
    prior(normal(0, 2), class = 'b', dpar = 'sigma'),
    prior(normal(0, 2), class = 'sd', group = 'participant', lb = 0),
    prior(exponential(0.0333), class = 'nu')
  ),
  #backend = BRM_BACKEND,
  file = 'rds/model.4.rds',
  file_refit = 'on_change' 
)

Step 3: posterior checks

summary(model.4)
##  Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: change ~ condition + (1 | participant) 
##          sigma ~ condition
##    Data: pose_df (Number of observations: 80) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~participant (Number of levels: 80) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.58      1.20     0.07     4.40 1.00     3406     2185
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                   24.38      5.00    15.24    34.67 1.00     6113
## sigma_Intercept              3.40      0.20     2.98     3.79 1.00     5980
## conditionexpansive          -0.10      1.88    -3.87     3.69 1.01     8773
## sigma_conditionexpansive     0.15      0.22    -0.28     0.59 1.00     8215
##                          Tail_ESS
## Intercept                    3424
## sigma_Intercept              3032
## conditionexpansive           2856
## sigma_conditionexpansive     2495
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## nu     5.78      6.79     1.68    22.11 1.00     4474     2774
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
cowplot::plot_grid(plot_predictions(model.4), p1, nrow = 2)

cowplot::plot_grid(plot_posteriors(model.4), p1, nrow = 2)

model.4.posteriors <-
  model.4 %>% 
  epred_draws(tibble(condition = c('expansive', 'constrictive')),
               re_formula = NA)
wrap_plots(
model.4.posteriors %>% 
  mutate(model = 'model 4') %>% 
    rbind(
    model.3.posteriors %>% 
      mutate(model = 'model 3'))  %>% 
  rbind(
    model.2.posteriors %>% 
      mutate(model = 'model 2'))  %>% 
  rbind(
    model.1.posteriors %>% 
      mutate(model = 'model 1')) %>% 
ggplot() + 
  geom_density(aes(x = .epred, fill = condition), adjust = 1.5, color = NA, alpha = .5) +
  facet_grid(model ~ .) +
  scale_x_continuous(limits = c(-150, 250), breaks = seq(-150, 250, by = 50)) +
  scale_color_theme() +
  theme_density_x +
  ggtitle('Compare the means of all four models'),
p1, nrow = 2, heights = c(4,1.5))

Reporting

Let’s use model 4

model

summary(model.4)
##  Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: change ~ condition + (1 | participant) 
##          sigma ~ condition
##    Data: pose_df (Number of observations: 80) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~participant (Number of levels: 80) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.58      1.20     0.07     4.40 1.00     3406     2185
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                   24.38      5.00    15.24    34.67 1.00     6113
## sigma_Intercept              3.40      0.20     2.98     3.79 1.00     5980
## conditionexpansive          -0.10      1.88    -3.87     3.69 1.01     8773
## sigma_conditionexpansive     0.15      0.22    -0.28     0.59 1.00     8215
##                          Tail_ESS
## Intercept                    3424
## sigma_Intercept              3032
## conditionexpansive           2856
## sigma_conditionexpansive     2495
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## nu     5.78      6.79     1.68    22.11 1.00     4474     2774
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

CI

model.4.posteriors.CI <- 
model.4.posteriors %>% 
  group_by(condition) %>% 
  median_qi(.epred, width = .95)
model.4.posteriors %>% 
ggplot() +
  geom_density(aes(x = .epred, fill = condition), alpha = .5, adjust = 2, color = NA) +
  geom_point(model.4.posteriors.CI, 
             mapping = aes(x = .epred, y = 0), size = 3) + 
  geom_errorbarh(model.4.posteriors.CI, 
                 mapping = aes(x = .epred, xmin = .epred.lower, xmax = .epred.upper, y = 0), height = 0, linewidth = 1.5) + 
  facet_wrap(condition ~ ., ncol = 1) + 
  scale_x_continuous(limits = c(0, 50)) + 
  scale_y_continuous(expand = c(.02,.02)) + 
  scale_color_theme() +
  theme_density_x 

subtraction

model.4.posteriors_diff <- 
model.4.posteriors %>% 
  compare_levels(variable = .epred, by = condition) %>% 
  select(-.chain, -.iteration) %>% 
  ungroup()
head(model.4.posteriors_diff)
.draw condition .epred
1 expansive - constrictive -1.8591500
2 expansive - constrictive 1.6581828
3 expansive - constrictive -1.0066992
4 expansive - constrictive 5.0253065
5 expansive - constrictive 1.0634401
6 expansive - constrictive 0.0398311
model.4.posteriors_diff.CI <-
model.4.posteriors_diff %>% 
  median_qi(.epred)

model.4.posteriors_diff.CI
.epred .lower .upper .width .point .interval
-0.0867433 -3.866209 3.691605 0.95 median qi
model.4.posteriors_diff %>% 
ggplot() +
  geom_density(aes(x = .epred), alpha = .5,  fill = theme_blue, color = NA, adjust = 2) +
  geom_point(model.4.posteriors_diff.CI, 
             mapping = aes(x = .epred, y = 0), size = 3) + 
  geom_errorbarh(model.4.posteriors_diff.CI, 
                 mapping = aes(x = .epred, xmin = .lower, xmax = .upper, y = 0), height = 0, linewidth = 1.5) + 
  #scale_x_continuous(limits = c(-50, 50)) + 
  xlab('expansive - constrictive') +
  ggtitle('Mean difference in expansive and constrictive') + 
  geom_vline(xintercept = 0, linetype = 2) + 
  scale_y_continuous(expand = c(.02,.02)) + 
  scale_x_continuous(limits = c(-10, 10), breaks = seq(-10, 10, by = 5)) + 
  scale_color_theme() +
  theme_density_x 

Session info

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] magick_2.7.3        gganimate_1.0.8     cmdstanr_0.5.2     
##  [4] knitr_1.39          ggplot2_3.4.0       bayesplot_1.9.0    
##  [7] ggdist_3.1.1        tidybayes_3.0.2     brms_2.17.0        
## [10] Rcpp_1.0.8.3        modelr_0.1.8        broom.mixed_0.2.9.4
## [13] broom_1.0.0         gtools_3.9.2.1      forcats_0.5.1      
## [16] tidyr_1.2.0         patchwork_1.1.2     purrr_0.3.4        
## [19] tibble_3.1.7        dplyr_1.0.9        
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-3     ellipsis_0.3.2       ggridges_0.5.3      
##   [4] markdown_1.1         base64enc_0.1-3      rstudioapi_0.13     
##   [7] listenv_0.8.0        furrr_0.3.1          farver_2.1.1        
##  [10] rstan_2.21.5         bit64_4.0.5          svUnit_1.0.6        
##  [13] DT_0.23              fansi_1.0.3          mvtnorm_1.1-3       
##  [16] diffobj_0.3.5        bridgesampling_1.1-2 codetools_0.2-18    
##  [19] splines_4.2.2        shinythemes_1.2.0    jsonlite_1.8.0      
##  [22] shiny_1.7.1          readr_2.1.2          compiler_4.2.2      
##  [25] backports_1.4.1      assertthat_0.2.1     Matrix_1.5-1        
##  [28] fastmap_1.1.0        cli_3.4.1            tweenr_1.0.2        
##  [31] later_1.3.0          htmltools_0.5.2      prettyunits_1.1.1   
##  [34] tools_4.2.2          igraph_1.3.1         coda_0.19-4         
##  [37] gtable_0.3.0         glue_1.6.2           reshape2_1.4.4      
##  [40] posterior_1.2.1      jquerylib_0.1.4      vctrs_0.5.1         
##  [43] nlme_3.1-160         crosstalk_1.2.0      tensorA_0.36.2      
##  [46] xfun_0.31            stringr_1.4.0        globals_0.15.0      
##  [49] ps_1.7.0             mime_0.12            miniUI_0.1.1.1      
##  [52] lifecycle_1.0.3      future_1.26.1        zoo_1.8-10          
##  [55] scales_1.2.0         vroom_1.5.7          colourpicker_1.1.1  
##  [58] hms_1.1.1            promises_1.2.0.1     Brobdingnag_1.2-7   
##  [61] parallel_4.2.2       inline_0.3.19        shinystan_2.6.0     
##  [64] yaml_2.3.5           gridExtra_2.3        loo_2.5.1           
##  [67] StanHeaders_2.21.0-7 sass_0.4.1           stringi_1.7.6       
##  [70] highr_0.9            dygraphs_1.1.1.6     gifski_1.6.6-1      
##  [73] checkmate_2.1.0      pkgbuild_1.3.1       rlang_1.0.6         
##  [76] pkgconfig_2.0.3      matrixStats_0.62.0   distributional_0.3.0
##  [79] evaluate_0.15        lattice_0.20-45      labeling_0.4.2      
##  [82] rstantools_2.2.0     htmlwidgets_1.5.4    cowplot_1.1.1       
##  [85] bit_4.0.4            tidyselect_1.1.2     processx_3.5.3      
##  [88] parallelly_1.31.1    plyr_1.8.7           magrittr_2.0.3      
##  [91] R6_2.5.1             generics_0.1.3       DBI_1.1.2           
##  [94] pillar_1.7.0         withr_2.5.0          xts_0.12.1          
##  [97] abind_1.4-5          crayon_1.5.1         arrayhelpers_1.1-0  
## [100] utf8_1.2.2           tzdb_0.3.0           rmarkdown_2.14      
## [103] progress_1.2.2       grid_4.2.2           callr_3.7.0         
## [106] threejs_0.3.3        digest_0.6.29        xtable_1.8-4        
## [109] httpuv_1.6.5         RcppParallel_5.1.5   stats4_4.2.2        
## [112] munsell_0.5.0        bslib_0.3.1          shinyjs_2.1.0